Efficient End-to-End Sentence-Level Lipreading with Temporal Convolutional Networks
Authors
Abstract
Lipreading aims to recognize sentences being spoken by a talking face. In recent years, lipreading methods have achieved a high level of accuracy on large datasets and made breakthrough progress. However, lipreading is still far from solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing training gradients and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model, using an encoder based on a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner to decode features. It can partly eliminate the vanishing-gradient and insufficient-performance defects of RNNs (LSTM, GRU), and this yields a notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than the state-of-the-art method, and that accuracy improves by 2.4% on the GRID dataset.
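The abstract names a TCN as the temporal feature learner that replaces an RNN. As a rough illustration of the building block behind a TCN, the following is a minimal pure-Python sketch of a dilated 1D convolution over a frame sequence, written in the causal form used by generic TCN formulations. The function names, kernel size, dilation schedule, and zero padding are all assumptions made for illustration; the abstract does not state the paper's actual TCN configuration or whether its convolutions are causal.

```python
def dilated_causal_conv1d(x, weights, dilation=1):
    """Apply a 1D causal convolution with the given dilation to a sequence.

    x: list of floats, one feature value per time step (hypothetical input).
    weights: kernel taps; weights[-1] multiplies the current step,
             weights[-2] the step `dilation` frames earlier, and so on.
    Positions before the start of the sequence are treated as zeros.
    """
    k = len(weights)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j in range(k):
            idx = t - (k - 1 - j) * dilation  # look back, never forward
            if idx >= 0:
                acc += weights[j] * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Input frames visible to one output frame after stacking the layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    frames = [1.0, 2.0, 3.0, 4.0, 5.0]
    # Each output is the current frame plus the frame two steps earlier.
    print(dilated_causal_conv1d(frames, [1.0, 1.0], dilation=2))
    # Assumed schedule: kernel size 3 with dilations doubling per layer.
    print(receptive_field(3, [1, 2, 4, 8]))
```

Because every output step depends only on a fixed window of inputs rather than on a recurrent state, all time steps can be computed in parallel and gradients do not have to flow through a long recurrence, which is the general motivation for preferring a TCN over an LSTM or GRU.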
Similar resources
LipNet: End-to-End Sentence-level Lipreading
Lipreading is the task of decoding text from the movement of a speaker’s mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). However, existing work on models trained end-to-end perform only word classification, rat...
End-to-End Multi-View Lipreading
Non-frontal lip views contain useful information which can be used to enhance the performance of frontal view lipreading. However, the vast majority of recent lipreading works, including the deep learning approaches which significantly outperform traditional approaches, have focused on frontal mouth images. As a consequence, research on joint learning of visual features and speech classificatio...
End-to-End Kernel Learning with Supervised Convolutional Kernel Networks
In this paper, we introduce a new image representation based on a multilayer kernel machine. Unlike traditional kernel methods where data representation is decoupled from the prediction task, we learn how to shape the kernel with supervision. We proceed by first proposing improvements of the recently-introduced convolutional kernel networks (CKNs) in the context of unsupervised lear...
LCANet: End-to-End Lipreading with Cascaded Attention-CTC
Machine lipreading is a special type of automatic speech recognition (ASR) which transcribes human speech by visually interpreting the movement of related face regions including lips, face, and tongue. Recently, deep neural network based lipreading methods show great potential and have exceeded the accuracy of experienced human lipreaders in some benchmark datasets. However, lipreading is still...
LipNet: Sentence-level Lipreading
Lipreading is the task of decoding text from the movement of a speaker’s mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). All existing works, however, perform only word classification, not sentence-level sequenc...
Journal
Journal title: Applied Sciences
Year: 2021
ISSN: 2076-3417
DOI: https://doi.org/10.3390/app11156975